A SAS/AF Application for Parallel Extraction, Transformation, and Scoring of a Very Large Database
Abstract
This paper describes a SAS/AF application that extracts large volumes of data in parallel from a multiple-terabyte RDBMS and directly populates a parallel SAS data mart. During population, the application allows the user to perform CPU-intensive data transformation and normalization operations in parallel. The application also allows models generated by Enterprise Miner software to be deployed in parallel to score the entire data mart, or subsets of it. Parallel execution is achieved using Torrent Systems' Orchestrate application development and runtime environment, which allows the application to
• extract data in parallel from a parallel RDBMS
• load the results of SAS programs back into the database in parallel
• process parallel data streams with parallel instances of a SAS DATA or PROC step for much higher throughput
• store large data sets in parallel, providing faster access and eliminating storage restrictions
• stream data between SAS steps without having to write intermediate results to disk.
The performance benefits of executing SAS extracts and other processes in parallel are well documented. In both production and test environments, parallel processing has allowed SAS applications to process larger workloads more quickly, typically improving performance by a factor equal to the number of processors used. These applications typically show near-linear scalability. (An example of linear scalability is a 12-processor system that provides 12 times the performance of a single processor.) These results are documented in the IBM white paper, "Achieving Scalable Performance for Large SAS Applications and Database Extracts."
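The partitioned processing pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's SAS/AF or Orchestrate code: the names `partition`, `normalize`, `score`, `run_partition`, `run_parallel`, and `speedup`, the `balance` column, the cap of 10,000, and the scoring formula are all hypothetical. Threads are used only to keep the example short and runnable; CPU-bound work would in practice need separate processes or a parallel runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

Row = Dict[str, float]


def partition(rows: Sequence[Row], n_parts: int) -> List[List[Row]]:
    """Split rows round-robin into n_parts partitions."""
    parts: List[List[Row]] = [[] for _ in range(n_parts)]
    for i, row in enumerate(rows):
        parts[i % n_parts].append(row)
    return parts


def normalize(row: Row) -> Row:
    """Hypothetical transformation: scale 'balance' into [0, 1] with an assumed cap."""
    cap = 10_000.0
    return {**row, "balance_norm": min(row["balance"], cap) / cap}


def score(row: Row) -> Row:
    """Hypothetical scoring rule standing in for a deployed model."""
    return {**row, "score": 0.3 + 0.5 * row["balance_norm"]}


def run_partition(rows: List[Row]) -> List[Row]:
    # The transformation feeds scoring directly, in memory;
    # no intermediate result is written to disk between the two steps.
    return [score(normalize(r)) for r in rows]


def run_parallel(rows: Sequence[Row], n_workers: int) -> List[Row]:
    """Run the same transform-and-score step on each partition concurrently."""
    parts = partition(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_partition, parts))
    # Concatenate the per-partition outputs; row order follows the partitions.
    return [r for part in results for r in part]


def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of serial to parallel elapsed time; linear scaling means it equals the processor count."""
    return t_serial / t_parallel


if __name__ == "__main__":
    data = [{"id": float(i), "balance": 1000.0 * i} for i in range(8)]
    scored = run_parallel(data, n_workers=4)
    print(len(scored), "rows scored")
```

Under these assumptions, each worker applies the same step to its own partition, which is the property that lets throughput grow with the number of processors. In the abstract's example of linear scalability, a job taking 120 time units on one processor would take 10 on twelve, so `speedup(120.0, 10.0)` gives 12.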